Contrastive Self-Supervised Learning for Optical Music Recognition


Authors

Carlos Peñarrubia (University of Alicante)*; Jose J. Valero-Mas (University of Alicante); Jorge Calvo-Zaragoza (University of Alicante)
carlos.penarrubia@ua.es*; jjvalero@dlsi.ua.es; jcalvo@dlsi.ua.es

Abstract

Optical Music Recognition (OMR) is the research area focused on transcribing images of musical scores. In recent years, the field has advanced considerably thanks to Deep Learning. However, such solutions require large volumes of labeled data. To alleviate this problem, Contrastive Self-Supervised Learning (SSL) has emerged as a paradigm that leverages large amounts of unlabeled data to train neural networks, yielding meaningful and robust representations. In this work, we explore its application to OMR which, to the best of our knowledge, has not been studied before. Using three datasets that represent the heterogeneity of musical scores in notation and graphic style, together with multiple evaluation protocols, we show that contrastive SSL delivers promising results and substantially mitigates the data scarcity problem in OMR. We hope this research serves as a baseline and stimulates further exploration.